Make changes to try to prevent memory bugs in parcel resolver #9885

yamadapc · 2024-08-01T02:38:27Z

This PR adds a copy of the resolver package that moves towards owned data structures rather than using arenas and borrowing.

The crate is copied so that it can be feature-flagged.

Benchmarks have been added that demonstrate the performance profile of the resolver, which is mostly IO bound. They show that the new resolver performs only marginally slower (single-digit % difference) than the previous one, which avoided copying and allocation.

Preliminary end-to-end performance testing on a large application shows no difference to overall build time at this point.

The feature-flag is in place to validate the impact on other metrics, including the frequency of crashes we are experiencing.

crates/node-bindings/src/init_sentry/sentry.rs

mattcompiles

We'll definitely want to clean this up ASAP once we've tested it.

devongovett · 2024-08-01T04:18:03Z

Can you use cargo flags for this instead of forking the whole crate?

yamadapc · 2024-08-01T04:14:49Z

packages/utils/node-resolver-rs/benches/node_resolver_bench.rs

Criterion benchmark results HTML file

These results were taken on an EC2 c7i.8xlarge instance. They vary quite a bit, as the test case is very fast. We should set up more complex test cases with more files per iteration, so that we also benchmark the caches and so on.

The best numbers we'll get will be from production, once the flag is rolled out.

| Benchmark | Time (lower / estimate / upper) | Change (lower / estimate / upper) | p-value | Criterion verdict | Outliers (of 100) |
| --- | --- | --- | --- | --- | --- |
| FileSystem - check for non existent file using stat (exists) | 531.77 ns / 533.04 ns / 534.47 ns | -1.1872% / -0.9784% / -0.7506% | 0.00 < 0.05 | Change within noise threshold | 11: 3 low mild, 1 high mild, 7 high severe |
| FileSystem - check for non existent file open (read_to_string) | 1.5756 µs / 1.5821 µs / 1.5890 µs | not reported | not reported | not reported | 10: 3 high mild, 7 high severe |
| Run safe resolver simple OsFileSystem | 53.090 µs / 53.399 µs / 53.688 µs | -1.2921% / -0.8435% / -0.3762% | 0.00 < 0.05 | Change within noise threshold | 12: 3 low severe, 5 high mild, 4 high severe |
| Run safe resolver modules OsFileSystem | 78.842 µs / 80.227 µs / 81.887 µs | -2.4456% / -1.4033% / -0.2646% | 0.01 < 0.05 | Change within noise threshold | 21: 14 low mild, 1 high mild, 6 high severe |
| Run unsafe resolver simple OsFileSystem | 48.232 µs / 48.455 µs / 48.658 µs | -1.9564% / -1.4375% / -0.9269% | 0.00 < 0.05 | Change within noise threshold | 12: 7 low severe, 2 low mild, 3 high severe |
| Run unsafe resolver modules OsFileSystem | 73.793 µs / 74.305 µs / 74.923 µs | +2.6275% / +3.9922% / +5.5641% | 0.00 < 0.05 | Performance has regressed | 5: 4 high mild, 1 high severe |
| Run safe resolver simple - PreloadingFileSystem - No IO | 23.365 µs / 23.547 µs / 23.760 µs | -0.8148% / -0.0801% / +0.6690% | 0.84 > 0.05 | No change in performance detected | 12: 6 high mild, 6 high severe |
| Run safe resolver modules - PreloadingFileSystem - No IO | 29.443 µs / 29.469 µs / 29.498 µs | -2.1236% / -1.6713% / -1.2740% | 0.00 < 0.05 | Performance has improved | 5: 1 low severe, 3 high mild, 1 high severe |
| Run unsafe resolver simple - PreloadingFileSystem - No IO | 18.784 µs / 18.800 µs / 18.820 µs | -0.2920% / -0.1053% / +0.0995% | 0.30 > 0.05 | No change in performance detected | 8: 1 low mild, 4 high mild, 3 high severe |
| Run unsafe resolver modules - PreloadingFileSystem - No IO | 25.081 µs / 25.370 µs / 25.792 µs | +0.6886% / +1.4525% / +2.2694% | 0.00 < 0.05 | Change within noise threshold | 17: 9 high mild, 8 high severe |

These are Apple M1 Pro results:

| Benchmark | Time (lower / estimate / upper) | Outliers (of 100) |
| --- | --- | --- |
| FileSystem - check for non existent file using stat (exists) | 674.93 ns / 689.44 ns / 712.65 ns | 10: 9 high mild, 1 high severe |
| FileSystem - check for non existent file open (read_to_string) | 1.2387 µs / 1.2618 µs / 1.2946 µs | 2: 1 high mild, 1 high severe |
| Run safe resolver simple OsFileSystem | 161.04 µs / 163.25 µs / 166.15 µs | 10: 4 high mild, 6 high severe |
| Run safe resolver modules OsFileSystem | 244.64 µs / 249.86 µs / 256.41 µs | 10: 8 high mild, 2 high severe |
| Run unsafe resolver simple OsFileSystem | 158.88 µs / 161.76 µs / 164.91 µs | 7: 6 high mild, 1 high severe |
| Run unsafe resolver modules OsFileSystem | 233.01 µs / 237.07 µs / 243.17 µs | 14: 1 low severe, 4 high mild, 9 high severe |
| Run safe resolver simple - PreloadingFileSystem - No IO | 18.144 µs / 18.418 µs / 18.842 µs | 17: 6 high mild, 11 high severe |
| Run safe resolver modules - PreloadingFileSystem - No IO | 24.043 µs / 24.276 µs / 24.489 µs | 19: 8 high mild, 11 high severe |
| Run unsafe resolver simple - PreloadingFileSystem - No IO | 13.251 µs / 13.273 µs / 13.298 µs | 12: 1 low mild, 1 high mild, 10 high severe |
| Run unsafe resolver modules - PreloadingFileSystem - No IO | 18.525 µs / 18.591 µs / 18.672 µs | 3: 3 high mild |
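One way to compare the safe and unsafe resolver runs above is the relative difference of their point estimates (the middle value of each Criterion `time:` triple). The sketch below is only an illustration of that arithmetic: the function and constant names are made up for this example, and the numbers are copied from the results posted in this comment.

```rust
/// Relative difference of `safe` over `baseline`, in percent.
/// A positive value means the safe resolver took longer.
fn relative_slowdown_pct(safe: f64, baseline: f64) -> f64 {
    (safe - baseline) / baseline * 100.0
}

/// (label, safe estimate in µs, unsafe estimate in µs).
/// Values are the point estimates from the results posted above.
const ESTIMATES_US: [(&str, f64, f64); 8] = [
    ("EC2 simple, OsFileSystem", 53.399, 48.455),
    ("EC2 modules, OsFileSystem", 80.227, 74.305),
    ("EC2 simple, no IO", 23.547, 18.800),
    ("EC2 modules, no IO", 29.469, 25.370),
    ("M1 Pro simple, OsFileSystem", 163.25, 161.76),
    ("M1 Pro modules, OsFileSystem", 249.86, 237.07),
    ("M1 Pro simple, no IO", 18.418, 13.273),
    ("M1 Pro modules, no IO", 24.276, 18.591),
];

/// Builds one formatted line per safe/unsafe benchmark pair.
fn report() -> Vec<String> {
    ESTIMATES_US
        .iter()
        .map(|(label, safe, baseline)| {
            format!("{label}: {:+.1}%", relative_slowdown_pct(*safe, *baseline))
        })
        .collect()
}
```

Calling `report()` and printing each line gives the per-pair difference. Point estimates ignore the confidence intervals and outliers, so small differences should be read with that in mind.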

devongovett · 2024-08-01T04:20:46Z

packages/utils/node-resolver-rs/src/invalidations.rs

-  pub invalidate_on_file_create: DashSet<FileCreateInvalidation>,
-  pub invalidate_on_file_change: DashSet<PathBuf>,
+  pub invalidate_on_file_create: RwLock<HashSet<FileCreateInvalidation>>,
+  pub invalidate_on_file_change: RwLock<HashSet<PathBuf>>,


DashSet should be fine. This seems unrelated to the cache change. Putting the entire hashset behind a lock will be much slower.

Hey @devongovett, from what I've measured, this isn't slower; we won't accept any significant regression.

I'm keen to remove a few of the bits here if possible. DashSet should be fine if we trust its implementation. But we need to simplify a few parts to improve reliability.

What makes you think this is causing issues? It seems unrelated to the potential memory safety problems you mentioned.

Dashmap is a super popular crate. It just shards the map into multiple disjoint maps under individual locks that can be accessed concurrently. I've previously measured improvements on benchmarks by using it. The resolver is quite hot and I did a lot of performance tuning on it.
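To make the sharding idea concrete, here is a minimal sketch using only the standard library. It is not DashMap's actual implementation; `ShardedSet`, the shard count, and the choice of `DefaultHasher` are assumptions made for illustration. The point is that each insert locks a single shard, whereas `RwLock<HashSet<T>>` locks the whole set.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

/// A set split into independently locked shards (illustrative only).
pub struct ShardedSet<T> {
    shards: Vec<RwLock<HashSet<T>>>,
}

impl<T: Hash + Eq> ShardedSet<T> {
    pub fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        let shards = (0..shard_count)
            .map(|_| RwLock::new(HashSet::new()))
            .collect();
        ShardedSet { shards }
    }

    /// Picks the shard that owns `value`, based on its hash.
    fn shard_for(&self, value: &T) -> &RwLock<HashSet<T>> {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        let index = (hasher.finish() as usize) % self.shards.len();
        &self.shards[index]
    }

    /// Write-locks only one shard, so writers hitting other shards
    /// are not blocked. Returns true if the value was newly inserted.
    pub fn insert(&self, value: T) -> bool {
        self.shard_for(&value).write().unwrap().insert(value)
    }

    /// Read-locks only the shard that could contain `value`.
    pub fn contains(&self, value: &T) -> bool {
        self.shard_for(value).read().unwrap().contains(value)
    }

    /// Sums shard sizes; each shard is read-locked in turn.
    pub fn len(&self) -> usize {
        self.shards.iter().map(|s| s.read().unwrap().len()).sum()
    }
}
```

Whether this beats a single lock in the resolver depends on how contended the invalidation sets are, which is what the benchmarks discussed in this thread would need to show.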

I was hoping to reduce the surface area of what can break.

As a follow-up or in this diff, I can include more benchmarks that try to replicate the performance gains that you've found. We can commit them onto the criterion suite and I can run them on both macOS and Linux/EC2.

devongovett · 2024-08-01T04:24:05Z

packages/utils/node-resolver-rs/src/specifier.rs

+  Hash(String),
+  Package(String, String),
+  Builtin(String),
+  Url(String),
 }


This also seems quite unrelated to the cache changes.

This is trying to simplify the code.

I understand Cow is here as a performance optimisation, but it doesn't seem like this makes a difference to performance.

That seems unrelated to the safety issues though. Small things like not copying specifiers do add up in my experience. Perhaps it doesn't appear to make a difference right now because the rest of your build is a larger portion of the overall time than the resolver.
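For readers following the `Cow` versus `String` point, here is a heavily simplified sketch of the trade-off. The enums and parsing rules below are hypothetical (only two variants, and the `node:` and `#` prefixes are assumptions for the example); they are not the resolver's real `Specifier` type. The borrowing version points into the input string, while the owned version allocates a copy per specifier.

```rust
use std::borrow::Cow;

/// Borrowing variant: reuses a slice of the input when nothing is rewritten.
#[derive(Debug, PartialEq)]
pub enum BorrowedSpecifier<'a> {
    Builtin(Cow<'a, str>),
    Hash(Cow<'a, str>),
}

/// Owned variant: always copies into a fresh `String`.
#[derive(Debug, PartialEq)]
pub enum OwnedSpecifier {
    Builtin(String),
    Hash(String),
}

/// Parses without allocating; the result cannot outlive `input`.
pub fn parse_borrowed(input: &str) -> Option<BorrowedSpecifier<'_>> {
    if let Some(name) = input.strip_prefix("node:") {
        Some(BorrowedSpecifier::Builtin(Cow::Borrowed(name)))
    } else if let Some(hash) = input.strip_prefix('#') {
        Some(BorrowedSpecifier::Hash(Cow::Borrowed(hash)))
    } else {
        None
    }
}

/// Parses with one allocation per specifier; the result has no lifetime.
pub fn parse_owned(input: &str) -> Option<OwnedSpecifier> {
    if let Some(name) = input.strip_prefix("node:") {
        Some(OwnedSpecifier::Builtin(name.to_string()))
    } else if let Some(hash) = input.strip_prefix('#') {
        Some(OwnedSpecifier::Hash(hash.to_string()))
    } else {
        None
    }
}
```

The owned form is simpler to store and pass between threads because it carries no lifetime; the borrowed form saves the copy. How much that copy costs in practice is the open question in this thread.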

Do you know roughly what types of benchmarks you ran? I would be keen to add some benchmarks we can use to measure this.

Note that in the benchmarks I've posted above, there is a 1-10% difference in overall time. However, since this is already quite fast, I don't think that will translate to a meaningful difference in build time. I'm quite confident we can optimise this much more by optimising IO and JSON parsing.

I've added a few notes on the benchmark suite I added.

I'd need to run them again; it was during development several years ago. It just seems strange to remove optimizations that we already implemented and that aren't causing any particular problem.

Our intention indeed isn't to remove any optimisations. We are only trying to measure things. I can add DashMap back quite easily, but I wanted to measure what allocation will do here.

I was asking what you've benchmarked so we can understand and reproduce your findings. Regardless of whether you run them again, we would want to know what types of benchmarks you're running and what the results are.

Usually end-to-end benchmarks, like esbuild's or https://github.com/web-infra-dev/bundler-benchmark

yamadapc · 2024-08-01T04:29:08Z

> Can you use cargo flags for this instead of forking the whole crate?

That'd be better, but if I do that can I switch between the two variants at runtime?

I think it won't make a big difference as the intention is just to do a safer/measured roll-out, then either revert or roll-forward shortly after.

devongovett · 2024-08-01T05:23:03Z

> can I switch between the two variants at runtime?

Not runtime but compile time, e.g. in canary or a dev build. Otherwise the binary size will increase due to having two resolver implementations as well.

yamadapc · 2024-08-01T05:38:48Z

> Not runtime but compile time, e.g. in canary or a dev build. Otherwise the binary size will increase due to having two resolver implementations as well.

@devongovett The reason the crate is copied is that I need to be able to switch at runtime, so that we can roll this out internally while measuring the outcomes.

Once we do that, we will have solid numbers on what impact this change has.

devongovett · 2024-08-01T13:23:23Z

Why does it need to be runtime rather than just releasing a different dev/canary version and testing with that?

yamadapc · 2024-08-01T22:20:55Z

Switching at runtime will help us:

- run the benchmarks that are already attached to this diff, which run new/old side by side
- do a feature rollout safely internally while collecting metrics
- run other benchmarks over our build, which I'll share results for

It also makes the change simpler, as I just had to copy directories.
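As a rough sketch of what runtime switching implies, both implementations are compiled into the binary and a flag value picks one when the resolver is constructed. Everything here is hypothetical: the `Resolve` trait, the two structs, and the stand-in resolution logic are made up for illustration and are not the actual crates or feature-flag plumbing in this PR.

```rust
use std::path::{Path, PathBuf};

/// Minimal interface shared by both implementations (hypothetical).
pub trait Resolve {
    fn resolve(&self, specifier: &str, from: &Path) -> Option<PathBuf>;
    fn name(&self) -> &'static str;
}

pub struct OldResolver;
pub struct NewResolver;

/// Stand-in logic: only handles "./" specifiers relative to `from`.
fn join_relative(specifier: &str, from: &Path) -> Option<PathBuf> {
    let rest = specifier.strip_prefix("./")?;
    Some(from.parent()?.join(rest))
}

impl Resolve for OldResolver {
    fn resolve(&self, specifier: &str, from: &Path) -> Option<PathBuf> {
        join_relative(specifier, from)
    }
    fn name(&self) -> &'static str {
        "old"
    }
}

impl Resolve for NewResolver {
    fn resolve(&self, specifier: &str, from: &Path) -> Option<PathBuf> {
        join_relative(specifier, from)
    }
    fn name(&self) -> &'static str {
        "new"
    }
}

/// Picks an implementation at runtime from a feature-flag value.
pub fn make_resolver(use_new: bool) -> Box<dyn Resolve> {
    if use_new {
        Box::new(NewResolver)
    } else {
        Box::new(OldResolver)
    }
}
```

This arrangement lets one binary run both variants side by side, at the cost of shipping both. The compile-time alternative suggested earlier would select one implementation with a Cargo feature, so only one ends up in the binary.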

yamadapc · 2024-08-01T23:05:20Z

This is a critical problem for us at the moment, so I'm merging and iterating.

devongovett · 2024-08-02T05:02:38Z

I think we may need to cut a separate branch for your team to work from, so that v2 is always in a stable, releasable, auditable state (or as close to it as possible). It's fine to do experiments or hotfixes in your own projects or in canary/dev builds, but v2 is getting messy and it is blocking stable releases. It seems like you don't really use stable releases anyway, given that we haven't done one since February. So it might be better if you had a branch that canary releases were published from, and we merged changes into v2 once they are stable.

pyamada-atlassian and others added 20 commits July 31, 2024 01:20

Start moving towards memory-safe resolver

7f1cbb6

Remove all unsafe from resolver

974f4b1

Fix a few broken bits

0fcee25

Remove arenas entirely to avoid dead-locks

22d684d

Fix only test

df460ea

Safe resolver

ab3dfc4

Fix warnings

fdc5a98

Update with optimisation on tsconfig loading

3e4921e

Update with more optimisation

a884630

Update lockfile

b426ddb

Remove copies

8cf4fe4

Merge branch 'pyamada/safe-resolver-final' of github.com:parcel-bundl…

2fa3604

…er/parcel into pyamada/safe-resolver-final

Update with more benchmarks

a5fcf50

Fix compilation

31f5990

Update benchmarks with nodes

fde5ec4

Update with slower / older serde_json5

859d1ab

Don't init tracing

80c5bfe

Prepare to feature-flag resolver change

f380f1e

Feature-flag resolver changes

5de2d1d

Update with deprecation notice

5ac79e2

yamadapc changed the title ~~WIP - Make changes to try to prevent memory bugs in parcel resolver~~ Make changes to try to prevent memory bugs in parcel resolver Aug 1, 2024

yamadapc requested a review from a team August 1, 2024 03:55

Fix warnings

cbae027

mattcompiles reviewed Aug 1, 2024

crates/node-bindings/src/init_sentry/sentry.rs

pyamada-atlassian added 5 commits August 1, 2024 04:00

Remove debug log

9ffb9e1

Remove generated entries benchmark test

5f9e9a7

Remove debug statement

ac5ce1d

Run tests with both resolvers

ce715be

Remove unused dependency

858e12a

mattcompiles approved these changes Aug 1, 2024

pyamada-atlassian added 3 commits August 1, 2024 04:15

Fix flow error

3bd1053

Fix a few mistakes in JS wrapper

3c2fe22

Fix feature-flag mistakes

40e64df

Fix more flow issues

6c21293

yamadapc commented Aug 1, 2024

devongovett reviewed Aug 1, 2024

pyamada-atlassian added 3 commits August 1, 2024 04:51

Try to fix windows compilation

c7afef4

Fix wasm compilation

680bd32

Add UTF-8 conversion to fix wasm

ea53fd9

github-actions bot deployed to Preview August 1, 2024 05:20

yamadapc merged commit 6f80bf1 into v2 Aug 1, 2024
17 checks passed

yamadapc deleted the pyamada/safe-resolver-final branch August 1, 2024 23:05

devongovett added a commit that referenced this pull request Aug 31, 2024

Revert "Make changes to try to prevent memory bugs in parcel resolver (…

bb6e7ec

…#9885)"
